JMIR Formative Research
JMIR Publications Inc.
Preprints posted in the last 7 days, ranked by how well they match JMIR Formative Research's content profile, based on 32 papers previously published here. The average preprint has a 0.06% match score for this journal, so anything above that is already an above-average fit.
Bergson, Z.; Vassall, S. G.; Wright, A.; McCoy, A. B.; Schafer, K. M.; Achee, M. C.; Sheffield, J. M.
Background: Concerns about "AI psychosis" have swirled in the media since ChatGPT's release, but few systematic analyses exist. We therefore conducted an electronic health record (EHR) analysis to identify the frequency, clinical characteristics, and quality of AI interactions in patients experiencing psychosis treated in a medical center. Methods: AI keywords (e.g., ChatGPT, AI) were used to search Vanderbilt University Medical Center's EHR from 12/1/2022-4/1/2026. Records were discarded if they were not AI-related or if the primary diagnosis did not include psychosis. Three raters read notes to determine if a patient was experiencing AI psychosis and classified the interactions using 4 a-priori categories (Catalyst, Amplifier, Co-Author, Object) formulated to explain how AI-related negative outcomes emerge. Findings: 73 patients met our criteria. 28 patients were rated as experiencing AI psychosis, 17 had neutral interactions, and 28 expressed delusional content related to AI without documented evidence of conversational AI use. ChatGPT was the matching keyword for 53.6% of patients experiencing AI psychosis. The majority of AI psychosis cases were documented after ChatGPT's "4o" model was released in May 2024. Notably, the AI Psychosis group had significantly more patients experiencing a first psychotic episode (60.7%) compared to the other two groups. Amplifier was the most common (64.3%) qualitative rating in the AI Psychosis group. Interpretation: "AI psychosis" is an infrequent but real phenomenon observed in clinical practice. Most affected patients were experiencing their first psychotic episode and presented with AI psychosis following the release of the more sycophantic GPT-4o. Among the affected patients, AI most often exacerbated an existing condition by reinforcing distorted ideas.
Dobbins, D.; Russell, A.; Gunther, M.; Shetty, V.; Shomali, A.; Vawdrey, D.; Waring, S.; Whary, P.; Wong, J.; Wright, E. A.; Olson, A. W.
Objectives: Older adults with comorbidities and polypharmacy have disproportionately high risk of hospitalization as well as readmission from adverse drug events (ADEs), of which 28%-71% are preventable (pADEs). This paper introduces an LLM application, CommunicADE, designed to support risk-mitigation of pADE-related readmission for the aforementioned population. We aim to evaluate CommunicADE's technical performance with OpenAI's HealthBench criteria: accuracy, completeness, communication quality, context awareness, and instruction following. Materials and Methods: Our technical validation study used an LLM (KimiK2.5) to simulate interviews between CommunicADE and nine high-fidelity synthetic patients hospitalized and at increased risk for pADE-related readmission (65+ years, comorbidities, 5+ medications). Some clues to pADE risk mechanisms were visible to CommunicADE in patient H&Ps, but most mechanisms were solely discoverable in interviews. Two pharmacists evaluated CommunicADE's interview questions and EHR notes with HealthBench-informed variables. Analyses used descriptive statistics. Results: For 35 mechanisms across 9 patients (avg=3.89 mechanisms/patient), CommunicADE's precision and recall were 0.92 and 0.63, respectively. Hallucinations were absent. Coherence and person-centeredness scored 4.28 and 4.44 on a 5-point scale (5=highest). On average, communication was at a 5th grade level and objective for 78% of patients. Most patient-reported quotes included in notes (92%) supported detected mechanisms. CommunicADE followed all instructions regarding interview length and patient approvals. Discussion: CommunicADE's strongest performance was in accuracy (precision, hallucinations), communication quality (coherence, readability), and context awareness (person-centeredness). Completeness (recall) and instruction following (objectivity, pADE mechanism/quote alignment) show room for improvement.
Conclusion: Findings suggest technical readiness for a feasibility pilot with real-world patients, and key areas for performance improvement.
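For readers less familiar with the two accuracy metrics quoted in this abstract, the following is a minimal Python sketch of how precision and recall are derived from detection counts. The counts are hypothetical, chosen only because they round to the reported 0.92 and 0.63 for 35 mechanisms; the abstract does not report the underlying true-positive, false-positive, or false-negative counts.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (precision, recall) from detection counts.

    precision = TP / (TP + FP): share of flagged mechanisms that were real.
    recall    = TP / (TP + FN): share of real mechanisms that were flagged.
    """
    if true_pos + false_pos == 0 or true_pos + false_neg == 0:
        raise ValueError("precision/recall undefined when a denominator is zero")
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical counts (NOT from the study): 22 detected + 13 missed = 35 mechanisms.
precision, recall = precision_recall(true_pos=22, false_pos=2, false_neg=13)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

A high precision with a lower recall, as reported, means that what the tool flags is usually correct but a sizable share of mechanisms go undetected.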
Gharibyan, I.; Ahner, E.; Shao, R.; Sharma, D.; Navarsartian Tazehkand, T.; Diep, J.; Assoumou, B.
Background: Statins are key to preventing atherosclerotic cardiovascular disease and lowering low-density lipoprotein cholesterol and cardiovascular events. However, skepticism regarding their safety and value persists and is increasingly influenced by social media. TikTok has emerged as a major source of health information, but its content varies in quality and accuracy. This study evaluated the quality, attitudes, misinformation, and engagement of statin-related content on TikTok. Methods: Public TikTok videos were collected using predefined search terms and coded by creator type, thematic content, and overall attitude. Video quality was assessed using the DISCERN instrument, the Patient Education Materials Assessment Tool for Audiovisual Materials, and the Global Quality Score. False or misleading claims were independently reviewed by two cardiology fellows. Associations between engagement and quality were also examined. Results: Of 1,349 screened videos, 258 met inclusion criteria. Most were educational (91.0%), with non-physician healthcare providers (34.5%) as the largest creator group. Risks or negative effects were discussed more often than benefits (63.2% vs 42.2%), and 39.5% contained at least one false or misleading claim, most often from complementary and alternative medicine providers and wellness promoters. Quality differed by creator type across all instruments, with physician-created content scoring highest. Video popularity showed minimal association with informational quality. Conclusion: Statin-related TikTok content frequently emphasizes harms, often contains misinformation, and varies substantially in quality by creator type. Greater involvement of healthcare professionals on social media may help improve digital health literacy and counter misleading information about statin therapy.
Wellman, A.; Messineo, L.; Azarbarzin, A.; Esmaeili, N.; Aishah, A.; Vena, D.; Sumner, J.; White, D.; Sands, S.
Objective: Several endotypes contribute to the development of Obstructive Sleep Apnea (OSA). However, efforts to measure these endotypes have been challenging. In this paper, we propose a new method that overcomes some of these challenges. Methods: To test the feasibility of this new method, data from the Sleep Heart Health Study (SHHS) were analyzed and two oxygen-based endotypes were identified and plotted on a graphical model: the steady-state SpO2 and the SpO2 arousal threshold. The first is the oxygen saturation that would occur during sleep if there were no arousals, and it is a measure of upper airway collapsibility (a more collapsible airway produces a lower SpO2). The latter is the oxygen saturation that triggers arousals. These endotypes were validated by assessing their ability to detect positional and state-related changes in airway collapsibility and arousal threshold. Results: The study showed that it was feasible to measure oxygen-based endotypes in 95% of SHHS participants. As expected, steady-state SpO2 was lower during supine vs. non-supine sleep, as well as during REM vs. NREM sleep. Also, the SpO2 arousal threshold was similar between supine and non-supine sleep. However, SpO2 arousal threshold was not lower in REM sleep vs. NREM sleep. Therefore, in 3 of the 4 conditions, the oxygen-based endotypes moved in the expected direction due to positional or sleep state changes. Conclusion: Although further validation experiments are required, this study indicates that OSA endotyping using the pulse oximetry signal is feasible. The oxygen-based endotypes could be used to aid therapeutic decision making.
Bedwell, G. J.; Madden, V. J.; Isaacs, A.; Khorommbi, H.; Moloi, N.; Papaioannou, G.; Solomons, S.; Sudan, S.; Parker, R.
Introduction: Dysmenorrhoea is highly prevalent globally and interferes with engagement in education, work, social participation, and quality of life. Although evidence suggests that sociocultural beliefs influence how menstrual pain is understood and managed, relatively little research has explored dysmenorrhoea-related knowledge and beliefs within South Africa. This study aimed to (1) determine the frequency of dysmenorrhoea, (2) assess dysmenorrhoea-related knowledge and compare knowledge between menstruating and non-menstruating individuals, and (3) explore commonly held generational, cultural, and religious beliefs related to dysmenorrhoea in a South African university cohort. Methods: We analysed data collected as part of a cross-sectional survey conducted among staff and students at a South African university. Participants completed demographic questions, items assessing dysmenorrhoea-related knowledge, and an adapted Working Ability, Location, Intensity, Days of Pain, Dysmenorrhoea (WaLIDD) questionnaire. Participants were also invited to provide free-text responses describing generational, cultural, and religious beliefs about dysmenorrhoea. Quantitative data were analysed descriptively and compared between menstruating and non-menstruating participants. Free-text responses were analysed using reflexive thematic analysis. Results: A total of 863 participants completed the survey, including 578 current or past menstruators. The frequency (95% CI) of dysmenorrhoea was 75.4% (71.7-78.9). Most participants were classified as having moderate (53%) or severe (31%) dysmenorrhoea on the WaLIDD scale. Awareness of dysmenorrhoea was higher among participants who had menstruated than among those who had never menstruated (80.4% vs 55.3%, p<0.001). Most participants (85.1%) reported wanting more education about dysmenorrhoea and its impact.
Reflexive thematic analysis of 246 free-text responses identified five themes: (1) menstrual pain is normalised, dismissed, and expected to endure, (2) reproductive meanings attached to menstrual pain, (3) moral, spiritual, and cultural interpretations of menstrual pain, (4) negotiating competing explanations for menstrual pain, and (5) managing and controlling menstrual pain symptoms. Across themes, dysmenorrhoea was interpreted through social, cultural, reproductive, spiritual, and biomedical frameworks that shaped how pain was understood, communicated, and managed. Conclusion: Dysmenorrhoea is common in this South African university cohort, and is rarely understood as a purely biological symptom. Instead, menstrual pain is understood and managed through broader social, cultural, reproductive, moral, and biomedical narratives, which shape how pain is recognised, disclosed, legitimised, and treated. These findings highlight the importance of considering sociocultural beliefs alongside clinical factors when developing menstrual health education, support strategies, and healthcare services.
Hudson, G. R.; Khan, D. Z.; Fayez, F.; Bhatia, S.; Bano, S.; Costanza, E.; Blandford, A.; Stoyanov, D.; McCulloch, P.; Marcus, H. J.; University College London Collaborators
Background: Endoscopic endonasal transsphenoidal surgery (EETS) requires navigation around neurocritical anatomy. Today, artificial intelligence clinical decision support systems (AI-CDSSs) can orientate surgeons, but clinician trust in AI remains unclear, limiting safe deployment. This study evaluates how modifiable design affects trust and performance in a real-world pituitary surgery AI-CDSS. Method: In an online study, 70 clinicians with pituitary surgery experience were randomised evenly to a Basic or an Enhanced AI-CDSS, each of which outlines the sella on EETS operative video. The Enhanced group additionally received an explanation of the model and previous publications, alongside confidence labels depicting outline reliability. Both groups annotated the sella on six video clips, first alone and then with the optional AI-CDSS. Clips were ordered by declining AI performance, except for the final clip. Self-reported trust was measured using a 1-7 scale after each annotation, and performance was the DICE overlap between user annotations and the ground truth. Comparisons used Mann-Whitney U and permutation analysis. Results: Sixty-four participants (91%) finished the exercise (31 Basic, 33 Enhanced). When AI performed best, median trust was 5.00 in both arms (U=559, p=.521). However, when AI performed worst, trust was significantly lower for the Enhanced group (3.00 vs 3.67, U=668, p=.035), sustained in the final clip (3.67 vs 4.33, U=687, p=.019). User performance improved with the AI-CDSS, but with no significant difference between the groups on the best or worst AI performing clips. Nevertheless, for the best AI, senior clinicians had higher median performance in the Enhanced group (0.95 vs 0.90, U=75, p=.066). There was also less dispersion in the Enhanced group when AI was inaccurate (IQR: 0.07 vs 0.21, p=.004).
Conclusion: Interface design can improve trust calibration in a surgical AI-CDSS, and may improve performance among senior clinicians when the AI is accurate and consistency when the AI is inaccurate. In future, these features may form important safety checks during translation to the operating room.
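The performance measure in this abstract is the DICE (Dice) overlap between a user's annotation and the ground truth. As a rough illustration of that metric only, here is a minimal Python sketch over small binary masks. The masks, the function name, and the convention for two empty masks are assumptions made for the example; the abstract does not describe the study's implementation.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A n B| / (|A| + |B|) for two binary masks (nested lists of 0/1)."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(a and b for a, b in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 3x4 masks: 4 labelled pixels each, 3 of them shared.
ground_truth = [[0, 1, 1, 0],
                [0, 1, 1, 0],
                [0, 0, 0, 0]]
annotation = [[0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 0]]
dice = dice_coefficient(ground_truth, annotation)
print(f"Dice = {dice:.2f}")
```

A score of 1 means the annotation and ground truth coincide exactly, and 0 means they share no pixels, which is the scale on which the 0.95 vs 0.90 medians above are reported.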
Odeny, T. A.; Adhiambo, H. F.; Mangale, D.; Makanga, P. K.; Odeny, B.; Okuku, F.; Zhou, C.; Geng, E.; Carson, J.; Mudhune, V.; Bukusi, E.; Semeere, A.
Background: Kaposi sarcoma (KS) is the most common cancer among men in several Eastern African countries, yet treatment monitoring relies on imprecise, time-consuming ruler-based measurements defined by the AIDS Clinical Trial Group (ACTG). This method suffers from inter-observer variability, fails to capture lesion height or true geometric area, and performs poorly on dark skin. SkinScan3D (SS3D) is a portable, low-cost, AI-enabled 3D imaging device that provides objective measurements of KS skin lesion area, height, volume, and color. The Precision Imaging to Evaluate Kaposi Sarcoma (PRIME-KS) study evaluates whether SS3D provides more reproducible and accurate lesion measurements than the standard method, and validates its integration into routine clinical workflows in Kenya and Uganda. Methods: PRIME-KS is a multicountry prospective mixed-methods study with two clinical objectives. Objective 1 is a cross-sectional diagnostic accuracy study comparing SS3D with ruler-based measurement in 50 adults with KS (150 lesions) across sites in Kenya and Uganda. Two clinicians independently measure three lesions per participant using both methods. The primary outcomes are the concordance correlation coefficient (CCC) for inter-rater reproducibility and the coefficient of determination for accuracy. Objective 2 is a non-randomized before-and-after pilot study in 100 patients at three sites, evaluating device usability, acceptability, appropriateness, and feasibility using validated instruments, along with time-and-motion studies and activity-based micro-costing. Prior to these clinical objectives, a formative study used focus group discussions, discrete choice experiments, and human-centered design workshops to refine the SS3D device and protocols with end-user input. Discussion: PRIME-KS will provide the first rigorous evaluation of a 3D imaging device for monitoring KS treatment response in routine clinical settings.
If SS3D demonstrates superior reproducibility and clinical utility, it could reduce unnecessary chemotherapy exposure and associated toxicities by enabling earlier, more objective assessment of treatment response. Trial registration: ClinicalTrials.gov NCT06898203, registered 27 March 2025. Pan African Clinical Trials Registry PACTR202603523439856. Keywords: Kaposi sarcoma, SkinScan3D, 3D imaging, treatment monitoring, diagnostic accuracy, implementation science, usability, human-centered design, Kenya, Uganda
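The protocol's primary reproducibility outcome is the concordance correlation coefficient (CCC). The sketch below shows one common form, Lin's CCC with population (1/n) moments, applied to two raters' measurements. The measurements are hypothetical and the choice of estimator is an assumption; the protocol abstract does not say which variant will be used.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's CCC: 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical lesion diameters (mm): rater B reads 1 mm higher than rater A every time.
rater_a = [10.0, 12.0, 15.0, 20.0]
rater_b = [11.0, 13.0, 16.0, 21.0]
ccc = concordance_correlation(rater_a, rater_b)
print(f"CCC = {ccc:.3f}")
```

In this example the two raters are perfectly correlated, yet the CCC falls below 1 because of the constant 1 mm offset. That penalty for systematic disagreement is why CCC, rather than a plain correlation, suits inter-rater reproducibility.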
Kasaju, M.; Shrestha, A. P.; Oli, N.; Vaidya, A.
Introduction: Cardiovascular diseases (CVDs) are the leading cause of death and disability worldwide, accounting for 75% of deaths in low- and middle-income countries (LMICs) like Nepal. Urbanization and globalization remain the major causes of the rise in CVDs among the urban poor, along with the growth of slum settlements. This study aims to assess the knowledge, attitude and practice (KAP) regarding CVDs and their risk factors among women of one such urban poor community in Nepal. Methodology: This cross-sectional study (n=388) in the Sinamangal-Minbhawan slum area used a semi-structured questionnaire based on the STEPS survey and the HARDIC study, administered to participants selected through convenience sampling. Descriptive analysis was done using SPSS version 21, and KAP scores were further categorized based on the median score to perform multivariate logistic analysis. Anthropometric and blood pressure measurements were also recorded and analyzed. Results: The median age (interquartile range) of participants was 33 years (17); the majority were Dalit by ethnicity, housewives, educated up to primary level, and of upper-lower socioeconomic class. More than half (53.3%) of the participants were obese and over 23% were hypertensive. While half of the hypertensive women were aware of their status, only 3% had their blood pressure under control. The median knowledge, attitude and practice (KAP) scores were 12, 60 and 10, respectively. The KAP scores were positively associated with the socioeconomic status of the participants. Conclusion: Among women of this slum area, the study revealed low knowledge of CVDs and a high prevalence of behavioral risk factors, together with a high prevalence of metabolic risk factors such as high body mass index, high waist-hip ratio and hypertension, alongside a positive attitude towards preventing CVDs and their risk factors.
Sood, E.; Canter, K.; Arasteh, K.; Kazak, A. E.
Background: Maternal mental health problems are common after prenatal diagnosis of congenital heart disease (CHD), with long-term implications for child and family wellbeing. HEARTPrep is a prenatal psychosocial intervention with three self-paced modules and corresponding telehealth sessions, delivered during pregnancy via mobile app to improve mental health and wellbeing for mothers expecting a baby with CHD. This proof-of-concept study evaluated the feasibility of HEARTPrep and examined maternal mental health and psychosocial functioning throughout participation. Methods: Participants were mothers receiving care for a fetal CHD diagnosis within one health system. Feasibility was assessed via rates of enrollment and completion. Mothers completed 4-item PROMIS questionnaires assessing anxiety, depression, and social isolation and reported self-efficacy and hope on a weekly basis throughout HEARTPrep. Results: Of 34 recruited mothers, 29 (85%) enrolled and two were subsequently not eligible (delivery prior to participation, change in fetal diagnosis), resulting in a final sample of 27 mothers. The majority (n = 22, 81%) completed all three telehealth sessions and Modules 1 (n = 22, 81%) and 2 (n = 19, 70%), with just over half (n = 14, 52%) completing Module 3 prior to delivery. Mean PROMIS depression T-scores decreased from 57.5 to 52.9, and 48% of mothers had a decrease in depression scores exceeding the meaningful change threshold (half standard deviation). The percentage of mothers reporting high self-efficacy increased from 19% to 48%. Conclusions: HEARTPrep is feasible and corresponds with reduced maternal depression and increased self-efficacy, supporting proof-of-concept. A randomized controlled trial is needed to determine whether HEARTPrep improves outcomes compared to a control group.
Bond, J.; O'Connel, N.; Wand, B.; Chalmers, J.; Kal, E.
Chronic pelvic pain (CPP) affects up to 26% of women worldwide. While its pathophysiology is poorly understood, disturbances in body perception have been identified in various similar chronic musculoskeletal disorders. The Fremantle Perineal Awareness Questionnaire (FrePAQ) is a novel tool designed to specifically assess disturbed body perception in the pelvic region, but its structural validity and reliability require formal evaluation. Methods: Patient partners with lived experience contributed to study design. Participants with (n=417) and without (n=277) chronic pelvic pain completed the FrePAQ at baseline and again one week later. We assessed the validity and reliability of the FrePAQ following COSMIN guidelines for Classical Test Theory. Results: The validated FrePAQ comprises a two-factor model, with a six-item Distress & Disconnection (D&D) subscale and a two-item Size & Shape (S&S) subscale. Confirmatory analysis showed excellent fit (CFI = .988; RMSEA = .048) and measurement invariance between diagnostic groups. Internal consistency was high (Cronbach's alpha = .838 CPP, .819 controls). Test-retest reliability was high for D&D (ICC = .863) and acceptable for S&S (ICC = .695). FrePAQ scores showed weak to moderate correlations with pain scores (r = .234 to .255), psychological distress (r = .226 to .443), and functional impact (r = .172 to .295), particularly for the D&D subscale. Conclusion: The FrePAQ is a reliable and valid instrument to measure perineal perceptual disturbances in CPP. Future research will evaluate the tool's potential to support phenotyping and guide individualised interventions. Improved understanding of body perception disturbance in CPP can enhance diagnosis and treatment precision.
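Internal consistency in this abstract is summarised with Cronbach's alpha. As a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), the Python below computes it for a small response matrix. The responses are hypothetical and are not FrePAQ data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(r) != n_items for r in responses):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in responses]) for i in range(n_items)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(r) for r in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 5 respondents x 3 items on a 1-5 scale.
responses = [
    [1, 2, 1],
    [2, 3, 2],
    [3, 3, 3],
    [4, 5, 4],
    [5, 5, 5],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}")
```

Alpha approaches 1 when items rise and fall together across respondents, as they do in this toy matrix.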
Shah, K. P.; Airan Javia, S.; Savage, T.; Bressman, E.
End-of-rotation handoffs are critical for patient safety but add to documentation burden for hospitalists. Generative artificial intelligence (AI) may help automate handoff creation using electronic health record data, but its impact on quality and safety is unclear. Methods: We developed an AI handoff tool with a large language model using clinical notes as input and conducted a retrospective evaluation comparing AI-generated and clinician-authored handoffs. Handoffs were assessed across domains of quality and safety through a structured review. Results: Quality ratings were similar between AI and human handoffs (3.7 vs. 3.5, p=0.57). AI-generated handoffs were rated higher for organization (4.4 vs. 4.1, p=0.05) and completeness (4.1 vs. 3.6, p=0.01), but lower for conciseness (3.7 vs. 4.1, p=0.03) and accuracy (4.1 vs. 4.4, p=0.03). Error rates were comparable (0.3/handoff in both groups); however, AI-generated handoffs included inaccuracies (9% of AI errors) and hallucinations (1% of AI errors), while clinician-authored handoffs contained only omissions. Conclusion: Human and AI handoffs have differing error profiles and tradeoffs between completeness and conciseness. Prospective evaluation in clinical workflows is underway.
Ogunsemoyin, O.; Fayehun, O.
Introduction: Stroke care is time-sensitive, yet patients in low-resource settings may reach tertiary services only after passing through multiple formal and informal care options. This study examined documented care-seeking pathways and time to presentation among stroke cases recorded at the University of Medical Sciences Teaching Hospital (UNIMEDTH), Ondo State, Nigeria. Methods: A retrospective hospital record review was conducted using secondary data from the Stroke Registry, radiology department records, referral notes, and ambulance records at UNIMEDTH. The analysis included 371 stroke cases with documented time from symptom onset to UNIMEDTH presentation and reconstructable care pathways. First-contact routes were classified as hospital/biomedical, self/informal or traditional/faith-based care, and the number of documented steps defined pathway complexity before and including tertiary presentation. Frequencies and percentages described pathway patterns; median presentation times were compared using Mann-Whitney U and Kruskal-Wallis tests. Results: The median time to tertiary presentation was 24 hours (interquartile range [IQR] 9-72), and 317 patients (85.4%) presented after four hours. Only 30 patients (8.1%) presented directly to UNIMEDTH; 44 distinct care-pathway sequences were recorded. Hospital-facility first contact was documented for 81 patients (21.8%). It was associated with a median presentation time of 3 hours (IQR 2-6), compared with 48 hours (IQR 24-72) among patients whose initial contact was outside a hospital facility (U = 699.50, p < 0.001). The median time also differed across grouped first-contact categories and pathway complexity levels (both p < 0.001). Conclusion: Non-hospital or multi-step care-seeking pathways commonly preceded tertiary stroke presentations in this setting. The findings indicate that delayed tertiary arrival is partly embedded in the pathway followed after symptom onset. 
Interventions should combine public recognition of stroke warning signs with urgent referral linkages involving hospitals, patent medicine vendors, traditional and faith-based providers, and emergency transport systems.
Ogunsemoyin, O.; Fayehun, O.
Introduction: Early hospital presentation after stroke onset is necessary for rapid assessment and access to time-dependent acute management. This study examined the correlates of late presentation for stroke care among patients recorded at a tertiary hospital in Ondo State, Nigeria. Methods: A retrospective records review was conducted using secondary data from the Stroke Registry of the University of Medical Sciences Teaching Hospital, radiology department records, referral notes, and ambulance records. Records of stroke cases documented within the preceding 24 months were reviewed. Late presentation was defined as hospital presentation more than four hours after symptom onset. Frequencies, chi-square tests, and modified Poisson regression with robust standard errors were used to estimate adjusted prevalence ratios. Results: The analysis included 371 stroke cases. Of these, 317 (85.4%) presented after four hours, and the median time to presentation was 24 hours (interquartile range: 9-72 hours). Late presentation differed significantly by employment status, first-contact route, and pathway complexity at bivariate analysis. After adjustment, non-hospital first contact remained strongly associated with late presentation: patients whose first documented contact was non-hospital-based had almost 3 times the prevalence of delay compared with those whose first contact was hospital-based (adjusted prevalence ratio = 2.89; 95% confidence interval: 2.15-3.90; p < 0.001). Conclusion: Late presentation was pervasive in this tertiary hospital record cohort and was primarily associated with the initial direction of care-seeking. Stroke response interventions should emphasise immediate hospital presentation and strengthen urgent referral from non-hospital first-contact points.
Zhao, Y.; Yun, Y.; Bai, T.; Xiong, L.; Ruan, Y.; Zhao, H.; Wang, W.; Wang, F.
Objective: The onset of hypertension occurs at a younger age in China, and the relationship between health literacy and quality of life among middle-aged and older hypertensive patients remains unclear. This study explored whether perceived social support and self-efficacy mediate the association between health literacy and quality of life in middle-aged and older hypertensive patients. Methods: A questionnaire was administered to 1,015 middle-aged and older hypertensive adults from communities in six central provinces of China. The EQ-5D scale, Perceived Social Support (PSS) scale, Self-Efficacy Scale (SES), and Health Literacy Scale (HLS) were used to assess quality of life, social support, self-efficacy, and health literacy, respectively. Mplus 8.3 software was used to construct a structural equation model for path analysis. Results: The mean PSS, SES, HLS, EQ-5D, and EQ-VAS scores were 15.57±3.45, 10.61±2.41, 9.49±2.86, 0.88±0.18, and 71.06±17.49, respectively. Health literacy and quality of life scores significantly differed among middle-aged and older hypertensive patients, and both showed positive correlations with perceived social support and self-efficacy (both P<0.001). Perceived social support and self-efficacy exhibited a chain mediated effect on the relationship between health literacy and quality of life (EQ-5D utility index and EQ-VAS), accounting for 28.57% of the total effect of the EQ-5D utility index and 27.26% of that of the EQ-VAS. This study is the first to elucidate the mechanism by which health literacy influences quality of life in middle-aged and older hypertensive patients through the chain-mediated effect of perceived social support and self-efficacy. Conclusion: Health literacy is significantly correlated with quality of life in middle-aged and older hypertensive patients.
This association may operate on quality of life both directly and indirectly, through mediating pathways involving perceived social support and self-efficacy. Keywords: hypertensive patients, perceived social support, self-efficacy, health literacy, quality of life, mediating effect
Khan, D. Z.; Mao, Z.; Hudson, G.; Wijekoon, A.; Chen, J.-e.; Borg, A.; Dorward, N.; Blandford, A.; Clarkson, M.; McCulloch, P.; Bano, S.; Stoyanov, D.; Marcus, H.
Background: Endoscopic pituitary surgery involves navigating high-stakes anatomy where complications, such as carotid artery injury, cause devastating morbidity. While computer vision AI offers potential for real-time anatomical recognition to mitigate these risks, successful translation requires rigorous human-factors and performance evaluation. We present the iterative development and preclinical evaluation of a surgeon-controlled, real-time AI-assisted navigation system. Methods: Guided by IDEAL Stage 0 and DECIDE-AI frameworks, the study was conducted in two phases. Phase 1 was an exploratory study where surgeons used the system during high-fidelity simulated surgery and provided feedback via "Think Aloud" protocols and surveys. Following prototype iteration, a Phase 2 randomized crossover comparative trial was conducted with 19 neurosurgeons (15 trainees, 4 experts) performing high-fidelity simulated tumour resections with and without AI assistance, separated by a minimum 2-week washout. The primary outcome was surgical technical performance (OSATS). Workload, educational value, usability, trust, and implementation outcomes were also assessed. Results: Phase 1 informed hardware, model, and interface refinements, including optimized pedal-controlled overlays and prediction confidence metrics. In the comparative trial, AI assistance significantly improved overall technical performance (OSATS 19.79±4.06 vs. 17.32±4.11; p=0.027). This gain was experience-dependent; AI significantly augmented trainee performance (19.20±3.76 vs. 16.60±3.78), narrowing the proficiency gap, while expert performance remained high and stable. 100% of participants identified the system as a useful training tool. However, subjective workload was significantly higher in the AI arm (SURG-TLX 26.42±9.56 vs. 22.26±7.81; p=0.014). Despite this, usability (SUS 75.13±14.31) and implementation feasibility, acceptability, and appropriateness scores were consistently high (means >4.4/5).
Conclusions: This study provides a stepwise process for real-time AI development using pituitary surgery as a high-stakes exemplar. The refined surgeon-centric AI system improves training and technical performance, particularly for trainees. Next steps involve first-in-human studies and further exploration of longer-term human factors such as over-reliance, cognitive overload mitigation and trust calibration.
Charfeddine, N.; Schranz, M.; Schlump, C.; Rupprecht, M.; Ullrich, A.; Diercke, M.; AKTIN Research Group, ; Estupinan Mendez, J.
Background: Mass gathering events (MGEs) are associated with several public health challenges and may cause a strain on healthcare services. Literature findings on the impact of MGEs on emergency departments (EDs) are heterogeneous. Objectives: To examine shifts in ED attendance characteristics during a major sporting tournament, namely the UEFA European Football Championship 2024 held in Germany. Methods: We conducted a retrospective observational study using ED data from the Emergency Department Data Registry. We compared baseline ED attendance characteristics between the tournament and the reference period, defined as two weeks before and two weeks after the tournament, and between Germany game days and non-Germany game days. Hourly attendance patterns were analysed for all Germany games using a reference range. Results: We included data from 41 EDs, totalling 253,493 attendances during the study period. A 1.57% increase in attendance was observed during the tournament compared to the reference period, with baseline characteristics remaining similar. The median daily attendance within all EDs was slightly lower on Germany game days (4066) compared to non-Germany game days (4128). Modest changes were observed in the hourly attendance on Germany game days, most notable during the last Germany game where a decrease in attendance below the reference range extended over three hours. Conclusions: The observed shifts in ED attendance were minimal, suggesting that no major changes of public health relevance occurred in ED attendance during the tournament. We highlight the utility of using ED data for monitoring and for enhancing the understanding of the public health risks and challenges associated with MGEs.
Garavito Jimenez, D. A.; Bello Angulo, D. E.; Mejia Lemus, L. T.; Chipatecua, D.; Fula, D. D.; Perez-Rubiano, S.; Martinez, F. L.; Bohorquez Pinzon, J. C.
Background Between 2024 and 2025, Colombia universalized the Electronic Health Invoice with embedded Individual Health Services Delivery Records (RIPS), jointly abbreviated FEV-RIPS, as the standard for financial and clinical data exchange. ADRES -- the entity responsible for administering the resources of Colombia's General Social Security Health System -- faced the challenge of processing information from multiple heterogeneous sources generated by more than 55,000 healthcare providers. Health systems in high-income countries converge clinical-financial data in consolidated platforms; Colombia started from a fragmented architecture with incompatible historical sources, no cross-database standardization, and no centralized analytical infrastructure until 2023. Objective We describe the design, technical challenges of integrating heterogeneous data, and operational performance of the analytical infrastructure built by ADRES to centralize large-scale processing of Colombian health system information, and derive transferable lessons for health system resource administrators in Latin America facing equivalent digitalization mandates. Methods Technical-descriptive report based on operational metrics from the ADRES Azure/Databricks environment during January-November 2025. We report indicators of data volume, processing speed, computational capacity, concurrent use by functional group, and governance structure. The architecture integrates VPN connectivity with MinSalud, automated processing of multiple formats (XML, relational tables, flat files), and a medallion data lake (Bronze/Silver/Gold). Data quality challenges include structural inconsistencies across sources, coding incompatibilities (municipalities, dates, diagnoses), format heterogeneities in unstructured data, and absent technical documentation.
Results The platform manages 21 catalogs, 1,183 tables, and over 110,645 million stored records, with cumulative production exceeding 1 trillion processed records. It executes queries on 100 billion records in ten seconds using clusters of up to 32 TB RAM and 4,096 vCPU. During September-October 2025, monthly query peaks reached 78,028 across eleven functional groups. Integration required Python/PySpark parsers for variable-depth XML, equivalence tables for incompatible municipality codes, cleaning routines for extreme dates used as nulls (1900-01-01, 9999-12-31), and transformation logic bridging classic RIPS and FEV-RIPS. The platform supported econometric analyses, judicial mandate responses, and public interactive dashboards. Conversational AI integration (Genie, Copilot) extends analytical access to users without SQL knowledge. Conclusions ADRES built in one year an analytical infrastructure that provides, to our knowledge, the first published documentation of the systemic technical challenges of integrating heterogeneous data sources in a middle-income social security health system. Centralizing health system information at national scale is technically feasible under public institutional constraints -- but requires solving cross-source standardization problems the implementation literature does not document with quantitative precision. The derived lessons are transferable to health system resource administrators in Latin America facing equivalent challenges.
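The cleaning routine for extreme dates used as nulls can be illustrated with a minimal sketch. It is written in plain Python rather than the PySpark the abstract mentions, and it assumes dates arrive as ISO-formatted strings (YYYY-MM-DD); the function name `clean_date` and the constant `SENTINEL_DATES` are hypothetical and are not taken from the ADRES platform. Only the two sentinel values come from the abstract.

```python
from datetime import date
from typing import Optional

# The two sentinel values the abstract reports as being used in place of nulls.
SENTINEL_DATES = {date(1900, 1, 1), date(9999, 12, 31)}


def clean_date(raw: Optional[str]) -> Optional[date]:
    """Parse an ISO date string (assumed format), mapping sentinels to None.

    Blank, missing, and unparseable values are also returned as None so that
    downstream code sees a single representation of "no date".
    """
    if raw is None or not raw.strip():
        return None
    try:
        parsed = date.fromisoformat(raw.strip())
    except ValueError:
        # Not a valid ISO date; treat as missing rather than guessing.
        return None
    return None if parsed in SENTINEL_DATES else parsed


if __name__ == "__main__":
    for value in ["2025-03-14", "1900-01-01", "9999-12-31", "", "14/03/2025"]:
        print(repr(value), "->", clean_date(value))
```

Mapping sentinels to a true null keeps them out of date arithmetic, where a value such as 9999-12-31 would otherwise distort durations and ranges. A production version would presumably apply the same rule column-wise in the processing engine.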
Musholt, T. J.; Clerici, T.; Bergenfelz, A.; Schmidt, C. O.; Struckmann, S.
Background: Medical registries have gained importance in the evaluation of healthcare quality outcomes. In the absence of high-quality evidence, such as randomized controlled trials, studies based on registry data are essential for informing clinical guidelines. Methods for assessing data quality are rarely described in detail. To ensure the credibility of registry-based studies, registries must use all available technical and operational means to guarantee high data quality. Method: Eurocrine® is a pan-European endocrine surgical database and quality registry initially funded by the EU healthcare programme. It started in 2015 and included more than 200,000 interventions as of April 2025. To ensure high data quality, interactive and standardized reports are generated both centrally and locally via Microsoft Power BI. In addition, comprehensive data quality analyses were performed via the R-based package dataquieR. Results: Although a multitude of technical measures (for example, input screen design and real-time plausibility checks during data entry) are in place, they are not sufficient to prevent human errors at data entry. Errors identified in the reports were corrected, and preventive measures were implemented. Overall, the data quality was assessed as very good in terms of completeness, accuracy, and consistency. Conclusion: It is very important to provide registry users with an efficient and smart tool to identify data issues, as they have the clinical information to correct them. Data quality reports generated with dataquieR represent an effective tool for registry administrators. Predesigned Microsoft Power BI reports enable participating Eurocrine® clinics to self-audit their data.
Benning, L.; Hirsch, A.; Groeschel, M.; Roeschl, T.; Spott, M.; Hans, F. P.; Urban, T.; Busch, H.-J.; Meyer, A.; Madrid, J.
Background Emergency department (ED) triage is a high-stakes clinical decision process that determines patient prioritization and resource allocation under time pressure. Large language models (LLMs) have recently been proposed as decision-support tools for triage, yet most evaluations rely on simulated scenarios or curated datasets. Evidence from real-world clinical environments remains limited. The objective of this project was to systematically evaluate the performance, calibration, and reproducibility of multiple contemporary large language models for Emergency Severity Index (ESI) classification and sectoral allocation (ED vs. urgent care practice, UCP) using a comprehensive real-world triage dataset. Material and Methods Retrospective cross-sectional benchmarking study conducted at a tertiary academic ED in Germany with an integrated central point of assessment (CPA). The study included all consecutive adult walk-in encounters (>18 years) presenting between October 2023 and February 2024 (N = 16,107). Data were collected from a structured clinical decision support system capturing presenting complaints, vital signs, and triage decisions recorded by specialized nursing staff. Variables comprised structured clinical data routinely collected at triage, including presenting complaint categories (CEDIS-PCL), vital signs according to the ABCDE framework, and additional structured or free-text clinical information. Results The primary outcome was the agreement between LLM-predicted and nurse-assigned ESI levels measured using quadratic-weighted Cohen's κ. Secondary outcomes included sectoral assignment agreement, misclassification patterns (over- and under-triage), calibration metrics, and output reproducibility. Quadratic-weighted κ values ranged from 0.18 to 0.75 across models. Only a structured stepwise prompting strategy achieved substantial agreement (κ_qw = 0.747), approaching reported human inter-rater reliability.
Most models demonstrated moderate or lower agreement and systematic overconfidence, with expected calibration errors (ECE) based on verbalized confidence ranging from 0.099 to 0.355. Sectoral assignment agreement (i.e. ED vs. urgent care practice, UCP) was uniformly low (κ < 0.30). Reproducibility testing revealed substantial variability in 23% of cases, indicating non-deterministic output behavior for clinically relevant decisions. Conclusions Current large language models demonstrate heterogeneous and generally limited performance in real-world emergency triage tasks. Structured algorithm-guided prompting appears more influential than model architecture or size. Before clinical implementation, improvements in calibration, reliability, and workflow integration are required, alongside regulatory-compliant validation in prospective clinical settings.
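For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how quadratic-weighted Cohen's kappa and an expected calibration error could be computed. It is not the authors' code: the function names, the equal-width binning with ten bins, and the encoding of ESI levels as integers 1 to 5 are assumptions made for illustration.

```python
from typing import List, Sequence


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int], n_levels: int = 5) -> float:
    """Cohen's kappa with quadratic disagreement weights.

    Labels are assumed to be integers 1..n_levels (e.g. ESI levels).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(a)
    # Observed joint proportions of (rater A level, rater B level).
    observed: List[List[float]] = [[0.0] * n_levels for _ in range(n_levels)]
    for x, y in zip(a, b):
        observed[x - 1][y - 1] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_levels)]
    col = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]  # chance agreement
    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when both raters use a single level")
    return 1.0 - weighted_observed / weighted_expected


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """ECE: sample-weighted mean gap between stated confidence and accuracy per bin."""
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(confidences)
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        bins[index].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(mean_conf - accuracy)
    return ece
```

Quadratic weights penalize a two-level triage disagreement four times as heavily as a one-level disagreement, which suits an ordinal scale such as ESI. In the ECE sketch, a model that states 90% confidence but is right half the time contributes a gap of 0.4 for that bin.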
Leonard, S. A.; Dysart, K.; Callahan, A.; Siadat, S.; Zhang, J.; Handley, S. C.; Huybrechts, K. F.; Igbinosa, I.; Bateman, B. T.
Background: Epic Cosmos is a relatively new centralized electronic health record dataset with high potential utility in perinatal epidemiologic research. Objectives: The study objectives were to develop replicable steps to create longitudinal, linked maternal-infant cohorts in Cosmos, assess completeness of key variables, evaluate potential selection bias with restrictions for longitudinal healthcare encounters, and provide an example epidemiologic analysis. Methods: We created maternal-infant cohorts by starting with live births during 2023-2024 recorded in the BirthFact data table and joining with additional data tables as needed. We selected and created variables for perinatal characteristics, common comorbidities, and routinely measured vital signs and laboratory values, and assessed variable completeness. We sequentially restricted the birth cohort for maternal-infant linkage and longitudinal healthcare from first-trimester prenatal care encounter through infant follow-up care within 12 weeks post-discharge from birth hospitalization. Finally, we conducted an example analysis of the association between high systolic blood pressure in the first trimester (≥140 mm Hg) and later onset of preeclampsia among those with chronic hypertension. Results: The total linked birth cohort included 2,624,186 pregnancies. Completeness was >90% for most variables assessed but was 77% for racial and ethnic group and 76% for body mass index at delivery. Characteristics of the cohort were similar to those reported for the entire United States birth population based on birth certificate data, including similar regional and racial-ethnic composition. Longitudinal cohort restriction requiring linked records from first trimester prenatal care through infant follow-up care reduced the cohort size to 509,148 pregnancies. However, restriction had minimal effects on cohort characteristics.
In the example analysis, high systolic blood pressure was associated with increased risk of preeclampsia among those with chronic hypertension (aRR: 1.26; 95% CI: 1.22, 1.30). Conclusions: This study provides a rigorous and reproducible approach to creating longitudinal, linked maternal-infant cohorts in Epic Cosmos and the analytical findings suggest high data quality and representativeness.